library(dplyr)
library(naniar)
library(pcalg)
library(ggplot2)
library(dagitty)
library("splines")
First, let’s have a look at the data set.
country: recipient countryyear: yearUS_ODA: U.S. Official development assistance(treatment), defined as government aid designed to promote the economic development and welfare of developing countries[1].US_OOF: U.S. Other Official Flows, transactions by the official sector with countries on the List of Aid Recipients which do not meet the conditions for eligibility as Official Development Assistance or Official Aid.US_aid_total: U.S. total aid the recipient received.Rest_of_world_ODA: ODA except for U.S. and China(data of China is largely missing) the recipient received.Rest_of_world_OOF: OOF except for U.S. and China(data of China is largely missing) the recipient received.Rest_of_world_aid_total: rest of world total recipient received.polity2:Recipient Polity Score, "-10 (hereditary monarchy) to +10 (consolidated democracy)"oda_gni: ODA in percentage in gross national income of the recipient's.population:the total population of the recipient in that year.recipient_trade_world:Recipient Total Trade with WorldGDP_per_growth_rate: Recipient's GDP Per Capita growth rate, calculated by (this year - previous year)/previous year.GDP_total_growth_rate: Recipient's total GDP growth rate, calculated by (this year - previous year)/previous year.population_growth_rate: Recipient's Population growth rate, calculated by (this year - previous year)/previous year.## Warning: NAs introduced by coercion
## X country year US_ODA US_OOF US_aid_total Rest_of_world_ODA
## 1 0 Afghanistan 2000 3.211893 0.0000000 3.211893 113.9824
## 2 1 Afghanistan 2001 9.987947 0.0000000 9.987947 431.5182
## 3 2 Afghanistan 2002 470.108307 0.0000000 470.108307 913.8090
## 4 3 Afghanistan 2003 609.058533 0.8149289 609.873474 938.7944
## 5 4 Afghanistan 2004 949.496277 1.8299663 951.326233 1235.8495
## 6 5 Afghanistan 2005 1557.600098 24.0675926 1581.667725 1139.0460
## Rest_of_world_OOF Rest_of_world_aid_total polity2 oda_gni population
## 1 -0.1459959 113.8364 -7 NA 20.09376
## 2 -8.1849251 423.3333 -7 16.67001 20.96646
## 3 -35.2187729 878.5902 -7 31.79141 21.97992
## 4 -4.6263351 934.1680 -7 34.82247 23.06485
## 5 -36.8433380 1199.0061 -7 43.65011 24.11898
## 6 -57.4928169 1081.5532 -7 45.14558 25.07080
## recipient_trade_world GDP_per_growth_rate population_growth_rate
## 1 1743.062 NA NA
## 2 2288.148 NA 0.04162397
## 3 3263.558 0.36601500 0.04610845
## 4 2814.666 0.03586068 0.04703817
## 5 3027.984 0.06804592 0.04370528
## 6 3372.937 0.09596804 0.03796530
## GDP_total_growth_rate
## 1 NA
## 2 NA
## 3 NA
## 4 8.832278
## 5 1.414118
## 6 11.229715
First, let's visualize the distribution of the missing data.
we decide to drop the observations that have missing values for the following reason: 1. "MNAR: Missing Not at Random - the missing is not random, it correlates with unobservable characteristics unknown to a researcher." The missing rows gather around the last 400 rows, so we cannot assume the missing is at random. 2. we can afford to drop 7.7% of the original data as we still have 92.3% data left.
To get away with the time correlated concern, we decide to base our analysis on 2007 and 2012 because they have the least missing data. We conduct analysis on one of them, and use the result of the other one to compare, which serves as a part of sensitive analysis to test the robustness of the former.
aid = aid %>%
na.omit(aid)
aid2007 = aid %>%
filter(year==2007)
aid2012 = aid %>%
filter(year==2012)
The causal graph is constructed based on our prior assumptions and common sense:
The outcome(Y: GDP growth rate of the recipient) depends on the national demand, population growth rate, the foreign aid received, potentially its autocracy-democracy index(polity2), and the recipient’s openness (measured by its global trade).
We assume that our treatment, US ODA, affect the outcome via direct effect(the recipient receives the aid) and indirect effect of ODA_GNI. The idea for GNI is that given the same amount of foreign aid, if the aid amounts to a large portion of the recipient’s gross national income, this aid should have a larger impact on the recipient. A similar idea applies to the foreign aid provided by the rest of the world.
We assume the recipient country’s openness to global trading contributes to not only GDP growth but also the amount of aid it’s going to receive. The rationale behind the assumption is the potential economic welfare gains from an aid could strongly motivate the donor. For example, the donor might want to have an impact on, or take a share of, the recipient’s market. Finally, we think it’s reasonable to assume that national demand affects the recipient’s global trade.
The assumption we made polity2 a confounder that has an impact on a wide range of other variables. The rationale behind this assumption is that the donors might also be motivated by enlarging their political impact on the recipient, such as winning over political support.
The causal graph was constructed based on our prior knowledge, which entails that the relationship assumed could be inaccurate. Therefore, we adopt causal discovery algorithm to generate causal graphs which help us identify causal flows.
The core idea of the algorithm is that if conditional on all possible sets of other variables, there remains a significant relationship between variable A and B, then there exists a direct causal association between them. The direction of the arrow is determined by controlling the middle variable for every three connected variables.
alpha is the threshold for the signifiance level. If
alpha = 0: holding an extremely high bar for an estimate of ACE to be counted as significant. No edge will be present in the resulted graph.
alpha = 1: any relationship between nodes is considered to be valid and real. The resulted graph skeleton will be strongly connected.
causalAid = aid2012 %>%
dplyr::select(-X,-country,-year)
suff_stat <- list(C = cor(causalAid), n = nrow(causalAid))
pc_aid <- pc(suff_stat, indepTest = gaussCItest, labels = colnames(causalAid), alpha = 0.01, skel.method = "stable.fast")
plot(pc_aid, main = "")
## Loading required namespace: Rgraphviz
With a threshold of 0.01, the outcome is entirely seperated from the rest of variables. Therefore, we believe 0.01 is too strict and thus relax it to 0.05.
pc_aid <- pc(suff_stat, indepTest = gaussCItest, labels = colnames(causalAid), alpha = 0.05, skel.method = "stable.fast")
plot(pc_aid, main = "")
pc_aid <- pc(suff_stat, indepTest = gaussCItest, labels = colnames(causalAid), alpha = 0.12, skel.method = "stable.fast")
plot(pc_aid, main = "")
Comparing the resulted causal graph with our assumptions, we spot the following causal flow that we didn't identify before:
As we relax the treshold from 0.01 to 0.12, there are a few changes we think might be of interests:
It appears that the way variables affect the outcome is via population growth.
Informed by the causal discovery algorithm, we visualize the relationships that are suggested by the causal discovery algorithm in the below section.
We'd like to see
Since the generated causal graph suggests US_ODA \(\rightarrow\) Rest_of_World_ODA and no arrow point from Rest_of_World_ODA to the outcome, which is counterintuitive. we'd like to see the relationship between US_ODA, Rest_of_World_ODA, and the growth rate outcome.
We were surprised to see the causal graph doesn't discern a significant causal flow from Rest_of_World_ODA to the outcome, so we want to visually check if Rest_of_World_ODA is indeed not a mediator, aka if Rest_of_World_ODA doesn't directly contribute to the outcome.
When we relax the alpha threshold, we observe that polity2 and US_ODA now both contribute to the outcome. This is consistent with our prior knowledge. Therefore, we want to check if we should introduce interaction term for polity2 and US_ODA. Interaction term means given the same US_ODA, if the recipient will have different growth because they have different polity2. In other words, if the recipient will react differently towards the same treatment.
How GDP_total_growth_rate is affected by US_ODA and Rest_of_world_ODA?
ggplot(data = aid2012, aes(x = US_ODA, y = GDP_total_growth_rate)) +
geom_point(aes(color = Rest_of_world_ODA), size = 2) +
geom_smooth(se = FALSE, color = "blue")+
coord_cartesian(xlim = c(0,1000))+
scale_color_gradient(low = "yellow", high = "darkblue")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
From the plot, we can tell U.S. ODA and rest of world ODA are correlated, which verifies the generated causal graph. Looking at the y-axis of the plot, the ligher colored points mainly gathered at the bottom while darker ones take the upper part. We vaguely get a sense that both world ODA and U.S. ODA positively contribute to GDP total growth rate, but we admit that rest of world ODA's contribution(pattern of the color) is unobvious.
Observing the effect of U.S. ODA, we can see that there seem to be an increasing trend between 0-250, and another decreasing trend for US_ODA that's larger than 250. Instead of keep using continuous value for treatment, we decide to cut U.S ODA into two categories: [0,250] and [251,+).
aid2012 <- aid2012%>%
mutate(US_ODA_C = ifelse(US_ODA<250, 0,1))
ggplot(data = aid2012, aes(x = Rest_of_world_aid_total, y = GDP_total_growth_rate)) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,2), method = "lm",
se = FALSE, color = "red"
)+
coord_cartesian(xlim = c(0,2000))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Earlier, we didn't see a causal flow from rest of world aid to the outcome, which is implausible from common sense. However, the seemingly positive slope of the bluel line suggests that greater Rest_of_world_ODA_total has slightly higher growth rate, although it's clearly insignificant. However, we are still justified to believe rest of world aid might open a causal path: US_ODA \(\rightarrow\) rest of world aid\(\rightarrow\) GDP_total_growth_rate.
ggplot(data = aid2012, aes(x = US_ODA, y =polity2 )) +
geom_point(aes(color = GDP_total_growth_rate), size = 2) +
scale_color_gradient(low = "yellow", high = "darkblue")+
geom_smooth(se = FALSE, color = "blue")+
coord_cartesian(xlim = c(0,1000))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The colored scatter points don't seem to form a clear pattern. Therefore we DON't assume an interaction between polity2 and US_ODA.
ggplot(data = aid2012, aes(x = population_growth_rate, y = GDP_total_growth_rate )) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,3), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
According to the graph above, we might incorporate population growth rate as an non-linear variable in our model.
ggplot(data = aid2012, aes(x =log(recipient_trade_world), y = GDP_total_growth_rate )) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,2), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
Since the trade balance of recipient countries is relative big number compared to the foreign aid, we want to use the natural log of the trade balance so that we are able to see numbers on the same scale; also, our analysis would not misled by the big numbers. In this graph, we can see that the natural log of trade balance of recipient countries is non-linear.
ggplot(data = aid2012, aes(x = polity2, y = GDP_total_growth_rate )) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,3), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
mod<-lm(GDP_total_growth_rate~ as.factor(US_ODA_C), data = aid2012)
summary(mod)
##
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_ODA_C), data = aid2012)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.5980 -2.1512 -0.0461 2.0177 12.1562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5092 0.3762 11.987 <2e-16 ***
## as.factor(US_ODA_C)1 0.4825 0.9383 0.514 0.608
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.647 on 110 degrees of freedom
## Multiple R-squared: 0.002398, Adjusted R-squared: -0.006671
## F-statistic: 0.2644 on 1 and 110 DF, p-value: 0.6081
confint(mod)
## 2.5 % 97.5 %
## (Intercept) 3.763770 5.254699
## as.factor(US_ODA_C)1 -1.377036 2.341995
mod1 <- lm(GDP_total_growth_rate~ as.factor(US_ODA_C) +ns(log(recipient_trade_world),2)+ns(polity2,3)+ns(population_growth_rate,3),data = aid2012)
summary(mod1)
##
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_ODA_C) + ns(log(recipient_trade_world),
## 2) + ns(polity2, 3) + ns(population_growth_rate, 3), data = aid2012)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0556 -1.9284 -0.3254 1.6768 11.0134
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3742 3.0684 0.122 0.90318
## as.factor(US_ODA_C)1 -1.4542 0.9608 -1.514 0.13322
## ns(log(recipient_trade_world), 2)1 7.0013 2.6501 2.642 0.00954 **
## ns(log(recipient_trade_world), 2)2 1.4048 1.3955 1.007 0.31646
## ns(polity2, 3)1 -0.5196 1.3877 -0.374 0.70886
## ns(polity2, 3)2 -5.4723 3.6145 -1.514 0.13312
## ns(polity2, 3)3 0.2544 1.0984 0.232 0.81732
## ns(population_growth_rate, 3)1 7.6467 1.5332 4.987 2.52e-06 ***
## ns(population_growth_rate, 3)2 5.0143 4.7620 1.053 0.29483
## ns(population_growth_rate, 3)3 1.0454 3.0372 0.344 0.73141
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.187 on 102 degrees of freedom
## Multiple R-squared: 0.2938, Adjusted R-squared: 0.2315
## F-statistic: 4.715 on 9 and 102 DF, p-value: 3.032e-05
confint(mod1)
## 2.5 % 97.5 %
## (Intercept) -5.711927 6.4602844
## as.factor(US_ODA_C)1 -3.359837 0.4514714
## ns(log(recipient_trade_world), 2)1 1.744870 12.2577906
## ns(log(recipient_trade_world), 2)2 -1.363057 4.1726871
## ns(polity2, 3)1 -3.272090 2.2328938
## ns(polity2, 3)2 -12.641748 1.6970599
## ns(polity2, 3)3 -1.924319 2.4330696
## ns(population_growth_rate, 3)1 4.605608 10.6877901
## ns(population_growth_rate, 3)2 -4.431032 14.4595964
## ns(population_growth_rate, 3)3 -4.978882 7.0696293
According to our model, if the U.S ODA is smaller than 250, then it is expected to contribute to 0.3742 percent GDP growth for the recipient countries in 2012 when we hold other variables as constant; however, if the U.S ODA is greater than 250, then the expected GDP growth rate would decrease by 1.08 percent in 2012 holding other variables constant. Yet, the benefits of U.S ODA is insignificant no matter it is greater or less than 250 in 2012.
results <- data.frame(resid = resid(mod1), fitted = fitted(mod1))
ggplot(results, aes(x = fitted, y = resid)) +
geom_point() +
geom_hline(yintercept = 0)
In the residual plot, we can see that the residuals are assigned relatively evenly across the x-axis; also, the spread is also relatively random. Therefore, we can be confident that there is no heteroskedasticity in our model.
Besides the non-linear model, we also want to investigate the results from the linear model; and we would want to compare the residual plot and the adjusted R squared to see which model we would choose.
linear <- lm(GDP_total_growth_rate~ as.factor(US_ODA_C)+ recipient_trade_world+polity2+population_growth_rate, data = aid2012)
summary(linear)
##
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_ODA_C) + recipient_trade_world +
## polity2 + population_growth_rate, data = aid2012)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6338 -2.0881 -0.2703 1.9293 11.4664
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.591e+00 6.847e-01 3.785 0.000254 ***
## as.factor(US_ODA_C)1 -1.153e+00 9.897e-01 -1.165 0.246490
## recipient_trade_world 1.946e-06 2.420e-06 0.804 0.423023
## polity2 -7.298e-03 6.623e-02 -0.110 0.912462
## population_growth_rate 1.179e+02 3.119e+01 3.780 0.000258 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.465 on 107 degrees of freedom
## Multiple R-squared: 0.1239, Adjusted R-squared: 0.09111
## F-statistic: 3.782 on 4 and 107 DF, p-value: 0.006456
results <- data.frame(resid = resid(linear), fitted = fitted(linear))
ggplot(results, aes(x = fitted, y = resid)) +
geom_point() +
geom_hline(yintercept = 0)
We can see that the residual plots of the non-linear model is slightly better than that of the linear model because the residuals spread out more randomly. Also, the adjusted R squared is 0.2315 for the non-linear model which is higher than that of the linear model. Since the performance of the non-linear model is better, we would choose the non-linear model for our analysis and sensitivity analysis as well.
Despite the fact that only U.S ODA can directly connect with the outcome according to our causal discovery, it does not necessarily suggest that other variables can not affect the outcome. Therefore, we want to investigate another potential treatment: total amount of U.S aid; and we want to see its impact on the outcome.
ggplot(data = aid2012, aes(x = US_aid_total, y = GDP_total_growth_rate))+
geom_point()+
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
According to the graph, we might still want to cut the U.S aid total at around 250 because there is an increasing trend between 0 and 250.
aid2012 <- aid2012%>%
mutate(US_aid_total_C = ifelse(US_aid_total<250, 0,1))
p1 <- ggplot(data = aid2012, aes(x = population_growth_rate, y = GDP_total_growth_rate )) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,3), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
According to the graph above, we might incorporate population growth rate as an non-linear variable in our model.
p2 <- ggplot(data = aid2012, aes(x = log(recipient_trade_world), y = GDP_total_growth_rate )) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,1), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
Since the trade balance of recipient countries is relative big number compared to the foreign aid, we want to use the natural log of the trade balance so that we are able to see numbers on the same scale; also, our analysis would not misled by the big numbers. In this graph, we can see that the natural log of trade balance of recipient countries is non-linear.
p3 <- ggplot(data = aid2012, aes(x = polity2, y = GDP_total_growth_rate )) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,3), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
library("gridExtra")
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(p1,p2,p3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
mod2 <- lm(GDP_total_growth_rate~ as.factor(US_aid_total_C) +log(recipient_trade_world)+ns(polity2,3)+ns(population_growth_rate,3),data = aid2012)
summary(mod2)
##
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_aid_total_C) +
## log(recipient_trade_world) + ns(polity2, 3) + ns(population_growth_rate,
## 3), data = aid2012)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0977 -1.9686 -0.2409 1.5751 11.5972
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5556 3.2356 0.172 0.8640
## as.factor(US_aid_total_C)1 -1.0671 0.8860 -1.204 0.2312
## log(recipient_trade_world) 0.4010 0.1927 2.081 0.0399 *
## ns(polity2, 3)1 -0.6691 1.4018 -0.477 0.6341
## ns(polity2, 3)2 -6.5911 3.5954 -1.833 0.0697 .
## ns(polity2, 3)3 0.5208 1.1004 0.473 0.6370
## ns(population_growth_rate, 3)1 7.9876 1.5447 5.171 1.15e-06 ***
## ns(population_growth_rate, 3)2 4.3293 4.7997 0.902 0.3692
## ns(population_growth_rate, 3)3 0.9468 3.0302 0.312 0.7553
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.22 on 103 degrees of freedom
## Multiple R-squared: 0.2719, Adjusted R-squared: 0.2154
## F-statistic: 4.809 on 8 and 103 DF, p-value: 4.782e-05
confint(mod2)
## 2.5 % 97.5 %
## (Intercept) -5.86151803 6.9727103
## as.factor(US_aid_total_C)1 -2.82418225 0.6900471
## log(recipient_trade_world) 0.01885467 0.7832220
## ns(polity2, 3)1 -3.44925595 2.1109830
## ns(polity2, 3)2 -13.72177772 0.5395028
## ns(polity2, 3)3 -1.66154435 2.7031677
## ns(population_growth_rate, 3)1 4.92409059 11.0510462
## ns(population_growth_rate, 3)2 -5.18963437 13.8483210
## ns(population_growth_rate, 3)3 -5.06298759 6.9565041
As we switch to using the total amount of U.S aid, we can see that when the total aid is smaller than 250, the GDP growth rate would be expected to increaseby 0.5556 percent holding all else constant in 2007; however, if the aid is greater than 250, the GDP growth rate would be expected to decrease by 0.5115 percent holding all else constant in 2007. Yet, both benefits are insignificant. The results are consistent with our findings that use U.S ODA as the treatment.
A common robust test is to see how an unmeasured confounder could affect the result. We recognize that the effect of foreign aid on recipient's economic growth can go through complicated paths, in addition to the ones we showed above. However, due to the lack of data and the complexity of the topic, our causal graph couldn't capture all the plausible causal relationships. Therefore, we particular need to assume the existence of an unmeasured confounder and see how it would affect our prior result.
Potential unmeasured variables could include recipient's historical development, recipient's national diplomacy, the natrual resources the land of the recipient possesses...etc.
First, we need to model the treatment US_ODA_C. To do so, we would first want to visualize the association between U.S aid and other controlled variables. The goal of visualization is to find out if the association is linear or not and if we should introduce the nonlinearity into our model. For all graphs below, blue lines are dervied from the actual data, providing insights on how a variable affects if one receives the treatment or not. The red lines are used to depict that relationship.
Since the absolute value of trade balance of recipient countries is a relative big number compared to the foreign aid, we use the natural log of the trade balance so that we are able to see numbers on the same scale, so our analysis would not be misled by the big numbers. In this graph, we can see that the natural log of trade balance of recipient countries is non-linear.
Similarly, putting population on log scale helps fitting the model.
4.1. US_ODA_C~log(recipient_trade_world)
ggplot(data = aid2012, aes(x = log(recipient_trade_world), y = US_ODA_C)) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,2), method = "glm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
4.2. US_ODA_C~polity2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
4.3. US_ODA_C~population
ggplot(data = aid2012, aes(x = log(population), y = US_ODA_C))+
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,2), method = "glm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
In this graph, we can see that association between U.S ODA and politics score is non-linear; therefore, this variable should also be non-linear in our model
sensitivity_analysis <- function(.data, model_A, model_Y, assoc_A, assoc_Y) {
n <- nrow(.data)
# Obtain residuals with residuals()
# Obtain residual variances with sigma()
res_A <- residuals(model_A)
res_var_A <- sigma(model_A)^2
res_Y <- residuals(model_Y)
res_var_Y <- sigma(model_Y)^2
# Compute the mean and variance of U given A and Y
mean_U_term1 <- (assoc_A/res_var_A)*res_A
mean_U_term2 <- (((res_var_A - assoc_A^2)*assoc_Y)/(res_var_A*res_var_Y))*res_Y
mean_U <- mean_U_term1 + mean_U_term2
var_U_term1 <- (res_var_A - assoc_A^2)/(res_var_A*res_var_Y)
var_U_term2 <- res_var_Y - assoc_Y^2 + ((assoc_A*assoc_Y)^2)/res_var_A
var_U <- var_U_term1*var_U_term2
# Simulate U and add it to the data
U <- rnorm(n, mean = mean_U, sd = sqrt(var_U))
.data$U <- U
########################################################################
# The part below is the only part you need to change to implement
# the sensitivity analysis in a new context.
# Refit model to estimate the causal effect
updated_model <- lm(GDP_total_growth_rate~ US_ODA_C +log(recipient_trade_world)+polity2+log(population_growth_rate)+U, data = .data)
# The names of the coefficients and confidence interval output rows
# are called "A" for the treatment variable A.
# This will change in a new dataset.
list(c(
estimate = unname(coefficients(updated_model)["US_ODA_C"]),
ci_95_lower = confint(updated_model)["US_ODA_C",1],
ci_95_upper = confint(updated_model)["US_ODA_C",2]
))
}
# Begin the sensitivity analysis
# Fit required models for the sensitivity analysis
mod_A <- lm(US_ODA_C ~ ns(polity2,3)+ns(log(recipient_trade_world),2)+ns(log(population),2), data = aid2012)
mod_Y <- lm(GDP_total_growth_rate~ US_ODA_C +log(recipient_trade_world)+polity2+log(population_growth_rate), data = aid2012)
## Warning in log(population_growth_rate): NaNs produced
summary(mod_A)
##
## Call:
## lm(formula = US_ODA_C ~ ns(polity2, 3) + ns(log(recipient_trade_world),
## 2) + ns(log(population), 2), data = aid2012)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56039 -0.18527 -0.07231 0.04120 0.91509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4016 0.2581 -1.556 0.122701
## ns(polity2, 3)1 0.0832 0.1462 0.569 0.570527
## ns(polity2, 3)2 0.4664 0.3786 1.232 0.220775
## ns(polity2, 3)3 -0.1759 0.1120 -1.570 0.119378
## ns(log(recipient_trade_world), 2)1 0.3358 0.4278 0.785 0.434178
## ns(log(recipient_trade_world), 2)2 -0.4180 0.2174 -1.923 0.057171 .
## ns(log(population), 2)1 0.5985 0.5008 1.195 0.234696
## ns(log(population), 2)2 1.0996 0.2937 3.744 0.000298 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3344 on 104 degrees of freedom
## Multiple R-squared: 0.2302, Adjusted R-squared: 0.1784
## F-statistic: 4.442 on 7 and 104 DF, p-value: 0.0002351
summary(mod_Y)
##
## Call:
## lm(formula = GDP_total_growth_rate ~ US_ODA_C + log(recipient_trade_world) +
## polity2 + log(population_growth_rate), data = aid2012)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4973 -2.0910 -0.2106 1.8263 11.0633
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.73638 2.57299 3.395 0.000984 ***
## US_ODA_C -1.44675 0.92882 -1.558 0.122483
## log(recipient_trade_world) 0.44822 0.17169 2.611 0.010426 *
## polity2 -0.02564 0.06562 -0.391 0.696791
## log(population_growth_rate) 1.87865 0.47218 3.979 0.000131 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.319 on 100 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.1881, Adjusted R-squared: 0.1556
## F-statistic: 5.791 on 4 and 100 DF, p-value: 0.0003111
Since the median residule for mod_A is -0.07231 and for mod_Y is -0.2106, we assume the unmeasure variable should have a positive impact on the outcome.
The black line denotes the ACE(US_ODA that's larger than 250) under the specified U -> Y and U -> A assumption, and the grey in background indicates the range of ACE of the 95% CI. The blue line denotes the original ACE(without taking U into account).
US_ODA has on GDP_total_growth_rate is below 0, which means there is a strong evidence that when US_ODAis larger than 250, the growth in the recipient's contry is not as effective as US_ODA that's smaller or equal to 250. In other words, the hypothetical U in this case is unlikely to qualitatively change our ACE.This is counterintuitive, because usually we would assume more donation results in more obvious economic growth in the recipient's country. A potential explanation we come up with is that poorer pre-aid condition of the recipient results in more donation, but the donation might not be used effectively due to the poor pre-aid condition. For example, according to prior knowledge, the contries receiving the most aid are Iraq, Jordan, Syria, where the political situation is not very stable. We think this might account for the counterintuitive "more donation --> less ideal" effect.
Then what effect could U have? Within each U –> Y assumptioin, the various effect magnitude U –> A doesn't seem to change the generally less ideal effect of large donation comparing to small donation.
Within each U –> A, in general, the stronger the hypothetical effect of U –> Y, the less negative effect the large donation has on the GDP growth, but the trend is not obvious and not very consistent along with the increment maginitude of U –>A.
Therefore, our estimated ACE is quite robust even after considering a various causal flow that passes U, but insignificant.
ggplot(data = aid2007, aes(x = US_ODA, y = GDP_total_growth_rate))+
geom_point()+
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
According to the graph, we might need to cut US_ODA at around 100 because there is an increasing trend before 100.
aid2007 <- aid2007%>%
mutate(US_ODA_C = ifelse(US_ODA<100, 0,1))
p4<- ggplot(data = aid2007, aes(x = population_growth_rate, y = GDP_total_growth_rate)) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,3), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
p5<- ggplot(data = aid2012, aes(x = log(recipient_trade_world), y = GDP_total_growth_rate)) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,2), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
p6<- ggplot(data = aid2012, aes(x = polity2, y = GDP_total_growth_rate)) +
geom_point() +
geom_smooth(se = FALSE, color = "blue")+
geom_smooth(formula = y~ns(x,3), method = "lm",
method.args = list(family="binomial"),
se = FALSE, color = "red" )
library("gridExtra")
grid.arrange(p4,p5,p6)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...) :
## extra argument 'family' will be disregarded
mod3 <- lm(GDP_total_growth_rate~ as.factor(US_ODA_C) +ns(log(recipient_trade_world),2)+ns(polity2,3)+ns(population_growth_rate,3),data = aid2007)
summary(mod3)
##
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_ODA_C) + ns(log(recipient_trade_world),
## 2) + ns(polity2, 3) + ns(population_growth_rate, 3), data = aid2007)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6617 -1.7567 -0.2156 2.0653 17.5414
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.49439 3.18600 2.666 0.00885 **
## as.factor(US_ODA_C)1 1.04743 0.89680 1.168 0.24539
## ns(log(recipient_trade_world), 2)1 6.18904 2.90626 2.130 0.03548 *
## ns(log(recipient_trade_world), 2)2 0.03302 1.62126 0.020 0.98379
## ns(polity2, 3)1 -2.03635 1.58750 -1.283 0.20233
## ns(polity2, 3)2 -7.93164 4.07153 -1.948 0.05400 .
## ns(polity2, 3)3 -0.19390 1.25375 -0.155 0.87738
## ns(population_growth_rate, 3)1 -3.21362 1.58371 -2.029 0.04490 *
## ns(population_growth_rate, 3)2 -2.71532 4.72539 -0.575 0.56674
## ns(population_growth_rate, 3)3 1.90060 2.20009 0.864 0.38957
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.784 on 108 degrees of freedom
## Multiple R-squared: 0.1659, Adjusted R-squared: 0.09637
## F-statistic: 2.386 on 9 and 108 DF, p-value: 0.01667
confint(mod3)
## 2.5 % 97.5 %
## (Intercept) 2.1791863 14.80958767
## as.factor(US_ODA_C)1 -0.7301829 2.82503512
## ns(log(recipient_trade_world), 2)1 0.4283341 11.94974683
## ns(log(recipient_trade_world), 2)2 -3.1805861 3.24663212
## ns(polity2, 3)1 -5.1830531 1.11035211
## ns(polity2, 3)2 -16.0021293 0.13884684
## ns(polity2, 3)3 -2.6790615 2.29125322
## ns(population_growth_rate, 3)1 -6.3528151 -0.07443251
## ns(population_growth_rate, 3)2 -12.0818574 6.65121774
## ns(population_growth_rate, 3)3 -2.4603566 6.26155491
When we use the data from 2007, we can see that if the U.S ODA is smaller than 100, the GDP growth rate is expected to increase by 8.49439 percent while we holding other variables as constant; while if the U.S ODA is greater than 100, the GDP growth rate of recipients would be expected to increase by 9.54182 percent holding other variable constant. However, the benefits of U.S ODA (<100) is significant, which is different from our previous results using data from 2012. Also, the additional benefit of larger U.S ODA is also not significant; therefore, we can not conclude that the more the U.S ODA, the greater economic impact it can generate.
It also makes sense that our results differ across different years because the marginal benefit of additional foreign aid is very likely to be different. Since the marginal benefit can differ, then the estimated contribution to the economics should also be different for each year.
mod4 <- lm(GDP_per_growth_rate ~ as.factor(US_ODA_C) +ns(log(recipient_trade_world),2)+ns(polity2,3)+ns(population_growth_rate,3),data = aid2012)
summary(mod4)
##
## Call:
## lm(formula = GDP_per_growth_rate ~ as.factor(US_ODA_C) + ns(log(recipient_trade_world),
## 2) + ns(polity2, 3) + ns(population_growth_rate, 3), data = aid2012)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38535 -0.03160 0.00520 0.04228 0.18855
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03682 0.07467 0.493 0.62299
## as.factor(US_ODA_C)1 0.06235 0.02338 2.667 0.00891 **
## ns(log(recipient_trade_world), 2)1 0.00185 0.06449 0.029 0.97717
## ns(log(recipient_trade_world), 2)2 -0.01505 0.03396 -0.443 0.65855
## ns(polity2, 3)1 -0.04360 0.03377 -1.291 0.19960
## ns(polity2, 3)2 -0.13387 0.08796 -1.522 0.13112
## ns(polity2, 3)3 -0.01737 0.02673 -0.650 0.51735
## ns(population_growth_rate, 3)1 -0.01401 0.03731 -0.375 0.70815
## ns(population_growth_rate, 3)2 0.01591 0.11588 0.137 0.89110
## ns(population_growth_rate, 3)3 -0.09342 0.07391 -1.264 0.20911
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07755 on 102 degrees of freedom
## Multiple R-squared: 0.1028, Adjusted R-squared: 0.02366
## F-statistic: 1.299 on 9 and 102 DF, p-value: 0.2467
confint(mod4)
## 2.5 % 97.5 %
## (Intercept) -0.11128757 0.18492910
## as.factor(US_ODA_C)1 0.01597192 0.10872197
## ns(log(recipient_trade_world), 2)1 -0.12606828 0.12976876
## ns(log(recipient_trade_world), 2)2 -0.08240852 0.05230651
## ns(polity2, 3)1 -0.11058279 0.02338367
## ns(polity2, 3)2 -0.30834189 0.04059996
## ns(polity2, 3)3 -0.07038620 0.03565296
## ns(population_growth_rate, 3)1 -0.08801248 0.06000037
## ns(population_growth_rate, 3)2 -0.21395049 0.24576214
## ns(population_growth_rate, 3)3 -0.24002735 0.05317901
We also switch to another common estimator of economic growth: GDP per capita growth rate. The coefficient of U.S ODA is still not significant and the magnitude remains small; but the only difference is that when U.S ODA is over 250, there is insignificant benefit compared to a U.S ODA smaller than 250. Our results are consistent with the previous findings using GDP growth rate.
In our model, we only used data in a particular year, which might not be persuasive or representative enough for us to make strong conclusion on the impact of U.S aid. For the future study, we might want to consider using a panel dataset and a time series model, such as the mixed effect model; then we might be able to generalize our conclusion on the impact of U.S aid.
We also know that there are various type of foreign aid, ODA and OOF. Also, within ODA and OOF, there are various type of aids as well. Some are goverment grants, and others might be loans. Since there are different forms of foreign aids and their characteristics might generate different impact on the economics of the recipient countries. Therefore, we might also want to examine the impact of various types of foreign aids, which can lead us to an alternative answer.
We used causal discovery to explore various forms of causal paths and graphs; however, we did not build our regression based on different conditional sets generated through causal discovery. Therefore, for future study, we can have different sets of conditional variables according to causal discovery and build regression models for each of those.